The Library of Congress updates its Recommended Formats Statement regularly. It is a helpful quick reference for choosing a stable format when a choice is available. If converting data from a proprietary format to an open file format results in some data loss, consider saving both versions. For less established or proprietary formats, consider recording the format type, its version, and the software used to generate and play the file. This information can be included in the metadata or documentation.
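One way to record format information is a small sidecar file stored beside the original. The following is a minimal sketch in Python, not a prescribed schema: the FormatRecord class, its field names, and all of the example values are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FormatRecord:
    """Hypothetical record of how a file was produced and how it can be opened."""
    filename: str
    format_name: str
    format_version: str
    created_with: str  # software used to generate the file
    opens_with: str    # software known to play or render the file


def to_sidecar_json(record: FormatRecord) -> str:
    """Serialize the record as indented JSON so it stays readable as plain text."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Illustrative values only; replace them with details of the real file.
record = FormatRecord(
    filename="figure_03.xyz",
    format_name="Example proprietary format",
    format_version="2.1",
    created_with="ExampleTool 2.1",
    opens_with="ExampleViewer 2.x",
)
sidecar = to_sidecar_json(record)
print(sidecar)
```

The same fields could equally be written into an existing metadata record or a README. The point is that the format type, version, and software are captured in a form that does not depend on the proprietary tool itself.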
These guidelines may also be considered during file format selection:
13. Acquire the highest quality version of media to use for preservation
34. For EPUBs, opt for core media types, as defined by the EPUB specification
Dynamic maps, such as those generated with Google Maps, consist of many smaller map tiles that are loaded on the fly as users pan and zoom. Web crawlers cannot easily capture this experience, and it cannot be exported. If the map is not the focal point of the work and is being used to present a small number of locations, consider using one or more still images instead. Display the place name and coordinates for each pin in the caption and provide a link to a live map.
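A caption for a still map image can be assembled from the place name and coordinates. The sketch below is illustrative: the map_caption function is hypothetical, and it links to OpenStreetMap using its marker URL pattern purely as an assumption. Any live map service could be substituted.

```python
def map_caption(place: str, lat: float, lon: float, zoom: int = 12) -> str:
    """Build a caption giving the place name, coordinates, and a live map link."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    coords = f"{abs(lat):.4f}° {ns}, {abs(lon):.4f}° {ew}"
    # Assumed link target: an OpenStreetMap URL that places a marker at the pin.
    url = (
        "https://www.openstreetmap.org/"
        f"?mlat={lat}&mlon={lon}#map={zoom}/{lat}/{lon}"
    )
    return f"{place} ({coords}). Live map: {url}"


# Example pin; the coordinates are approximate.
caption = map_caption("Edinburgh Castle", 55.9486, -3.1999)
print(caption)
```

Because the coordinates appear in the caption text itself, the location remains recoverable even if the linked map service changes or disappears.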
These guidelines offer alternative ways to manage dynamic map features:
16. Captions add important context to non-text features
53. Consider web page designs that pre-load all data when the page loads
For custom websites or software, publishers should request an installation script from the authors or developers. This can be used in combination with a clean installation package (one that is free of extraneous files and data generated in the live environment during deployment and use) to install the software or website in a new environment. In addition to the installation script, the authors or developers should provide a document listing the machine requirements and any dependencies that will be installed or used by the script. If a script is not available, the authors or developers should at minimum provide documentation that describes the requirements, dependencies, and detailed installation process, with sample commands as appropriate. This information can be placed in a README file in the root of the project. Although installation scripts may stop working as technology evolves, they still record how to get the software working and can be vital context for a preservation service or when migrating to new infrastructure.
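An installation script often begins by checking that the machine meets the documented requirements. The following is a minimal sketch of such a check, assuming a project that needs a minimum Python version and a few command-line tools. MIN_PYTHON, REQUIRED_COMMANDS, and check_environment are hypothetical names, and the listed requirements are placeholders.

```python
import shutil
import sys

# Placeholder requirements; a real project would list its own.
MIN_PYTHON = (3, 8)
REQUIRED_COMMANDS = ["git"]


def check_environment(min_python, commands):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")
    for command in commands:
        # shutil.which returns None when the command is not on the PATH.
        if shutil.which(command) is None:
            problems.append(f"missing command: {command}")
    return problems


problems = check_environment(MIN_PYTHON, REQUIRED_COMMANDS)
if problems:
    print("Cannot install yet:")
    for problem in problems:
        print(f"  - {problem}")
else:
    print("All documented requirements are present.")
```

Even after such a script stops running, the constants at the top remain a readable statement of what the software expected from its environment.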
These guidelines also discuss the installation package for a web application:
61. Create installation packages for custom websites that don’t require a live server
62. Create installation packages for custom websites that do require a live server
67. Keep the source code and compiled version of the software
Data visualizations tend to be a particular arrangement of one or more raw datasets. Data visualization formats can obscure parts of the underlying data from which they are derived. They may also be compiled or complex. All of these properties could make the data difficult to open, validate, or comprehend in the future. To preserve a publication in which data visualizations are core intellectual components, request the underlying raw data from the author. Also request supporting documentation that would enable a future reader to retrace the author's steps from the raw data to the visualization. Images or videos of the visualization may also be helpful for recreating it. For both visualization and raw data formats, as with all supplements, the files should ideally be in an open or broadly adopted format. The Library of Congress Recommended Formats Statement can help with selecting formats. In the case of vector data, for example, there is not a broadly adopted open format, but Shapefile, while proprietary, is broadly adopted and openly documented. A variety of tools can read Shapefiles, which increases the likelihood that the format will continue to be supported in some form.
These guidelines may also be relevant when considering preservation of data visualizations:
11. Use non-proprietary, broadly supported and adopted open file formats
57. Use alternative approaches for features that require communication with a server
64. Use meaningful file names and field names in your data, supply documentation
For data that is to be preserved as part of, or as a supplement to, a publication, provide metadata and documentation that explain the context of the data and the meaning and limits of each file or field. In the absence of strong documentation, file and field names that convey some meaning can help support data reuse. For example, a field named “otherstuff_12” is less useful than “weight_in_kg.”
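Where a dataset already uses opaque field names, they can be renamed before deposit. The following sketch rewrites the header row of a CSV file using a mapping from old names to meaningful ones. The rename_fields function and the sample data are hypothetical; the field names come from the example above.

```python
import csv
import io


def rename_fields(csv_text: str, mapping: dict) -> str:
    """Return the CSV text with header names replaced according to mapping."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text  # nothing to rename in an empty file
    # Names without an entry in the mapping are kept unchanged.
    header = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()


# Sample data for illustration only.
original = "id,otherstuff_12\n1,4.2\n2,5.0\n"
renamed = rename_fields(original, {"otherstuff_12": "weight_in_kg"})
print(renamed)
```

The mapping itself is worth keeping with the documentation, since it records how the deposited field names relate to those used in the original working files.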
Do not send administrative data to a preservation archive unless it is integral to the work. For example, when exporting a SQL database, you may need to exclude or anonymize the content from user tables, indexes that support a specific UI, non-public communications, or logs. Only archive the essential data that can be made publicly accessible.
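One cautious approach is to copy only an explicit list of public tables into a new database, rather than removing sensitive tables from a full export. The sketch below uses SQLite from the Python standard library as a stand-in for whatever database the project uses. The export_public_tables function and the table names are hypothetical.

```python
import sqlite3


def export_public_tables(source, public_tables):
    """Copy only the allow-listed tables into a fresh in-memory database."""
    target = sqlite3.connect(":memory:")
    for table in public_tables:
        row = source.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if row is None:
            raise ValueError(f"table not found: {table}")
        target.execute(row[0])  # recreate the table from its stored definition
        rows = source.execute(f'SELECT * FROM "{table}"').fetchall()
        if rows:
            placeholders = ", ".join("?" for _ in rows[0])
            target.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', rows
            )
    target.commit()
    return target


# A stand-in for the live database, with one public and one private table.
live = sqlite3.connect(":memory:")
live.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO articles VALUES (1, 'Preserving maps');
INSERT INTO users VALUES (1, 'reader@example.org');
""")

archive = export_public_tables(live, ["articles"])
names = [
    r[0]
    for r in archive.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(names)
```

An allow-list fails safe: a table added to the live system later is left out of the archive unless someone deliberately includes it. Indexes are not copied in this sketch, so any that the archived data genuinely needs would have to be recreated separately.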
These guidelines refer to the creation of the installation package:
61. Create installation packages for custom websites that don’t require a live server
62. Create installation packages for custom websites that do require a live server
For data, software, or any resource that has a complex arrangement of files, if structured metadata cannot be supplied, a common convention is to include a README file from the author. Written in a plain text format, it should serve as a note to future users of the files. It should include information such as scope, purpose, author(s), relevant dates, license for reuse, dependencies, field names and descriptions, and instructions for use.
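A simple template can help authors cover each of these points. The sketch below renders a plain text README from a dictionary and reports any section left empty. README_FIELDS, missing_readme_fields, and render_readme are hypothetical names, and the sample content is invented for illustration.

```python
README_FIELDS = [
    "scope",
    "purpose",
    "authors",
    "relevant dates",
    "license",
    "dependencies",
    "field descriptions",
    "instructions for use",
]


def missing_readme_fields(details: dict) -> list:
    """List the expected sections that are absent or empty."""
    return [field for field in README_FIELDS if not details.get(field)]


def render_readme(title: str, details: dict) -> str:
    """Render a plain text README, refusing to omit any expected section."""
    missing = missing_readme_fields(details)
    if missing:
        raise ValueError("README is missing: " + ", ".join(missing))
    lines = [title, "=" * len(title), ""]
    for field in README_FIELDS:
        lines.append(f"{field.upper()}: {details[field]}")
    return "\n".join(lines) + "\n"


# Invented sample content.
details = {
    "scope": "Survey responses collected for chapter 4",
    "purpose": "Supports the charts in the published text",
    "authors": "A. Author",
    "relevant dates": "Collected 2020-2021",
    "license": "CC BY 4.0",
    "dependencies": "None; files are CSV",
    "field descriptions": "weight_in_kg: participant weight in kilograms",
    "instructions for use": "Open responses.csv in any spreadsheet tool",
}
readme_text = render_readme("README for survey dataset", details)
print(readme_text)
```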
See also:
68. Provide documentation for software
Compiled software may be opaque or impossible to modify, while source code may be impossible to compile if build dependencies become unavailable. Supplying both can enable different preservation pathways. If compiled software can no longer run because the operating system is incompatible, it may be possible to match it with an appropriate emulator. Source code gives future users an opportunity to understand the software when the documentation is insufficient, and may also allow the software to be modified to work in a different environment or context. Ensure that the software license is expressed within the package and is appropriate for reuse.
These guidelines refer to the creation of the installation package:
60. Request an installation script for custom software
61. Create installation packages for custom websites that don’t require a live server
62. Create installation packages for custom websites that do require a live server
Consider what a future user might need to know to run the software and understand how it should work. Ensure this is covered by the documentation. For example, what is the software for? What are the supported operating systems and versions? Are there any dependencies or requirements? How do you install it? How do you use it? What should it do if it is working? What is its license? Where the software itself cannot be preserved, visual and narrative documentation of the user experience can provide vital context.
This guideline refers to another common method for documenting software:
66. Use a README file to document data or software